Abstract: The recent explosive growth in the amount of content on the Internet has made it increasingly difficult for users to find and use information, and for content providers to classify and catalog documents. Browsing with traditional web search engines is tedious because they often return hundreds of results or more for a single query. Online libraries, web search engines, and other large document repositories (e.g. customer support databases, product specification databases, press release archives, and news story archives) are growing so quickly that classifying each document manually is difficult and costly. To address these problems, researchers are turning to automated methods for handling web documents so that they can be browsed, organized, and indexed more effectively with minimal human intervention. This paper reviews the concept of web mining and its techniques, explains the data mining process, and introduces the AdaBoost classification algorithm. It then proposes an enhanced AdaBoost algorithm. Both algorithms have been simulated and their results compared in terms of accuracy rate. The results show that the enhanced version achieves better performance and accuracy than the original AdaBoost algorithm.
Keywords: Web data mining, information retrieval, Web usage mining, pre-processing, pattern analysis, content mining, structure mining, classification.